Fixed the Concurrency model #30

Merged
anaslimem merged 1 commit into main from fix_Concurrency
Mar 18, 2026

@anaslimem
Owner

Description

Updated the concurrency model so that the checkpoint and the WAL stay consistent, with no gaps between them.

Copilot AI review requested due to automatic review settings March 18, 2026 18:58
@vercel

vercel bot commented Mar 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: cortexa-db
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Mar 18, 2026 6:58pm

Contributor

Copilot AI left a comment

Pull request overview

This PR updates CortexaDBStore’s periodic checkpointing concurrency to avoid checkpoint/WAL inconsistencies that could lead to “gaps” during crash recovery.

Changes:

  • Capture state_machine and last_applied_id under a single writer-lock acquisition for checkpoint creation (avoids TOCTOU mismatch).
  • Pass a pre-captured wal_path into the checkpoint thread instead of re-locking solely to fetch it.
  • Update the checkpoint thread function signature/call sites accordingly.
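
The first change above can be illustrated with a minimal sketch of how a mismatched state/id pair produces a recovery gap. The `Engine` struct, the `(id, value)` WAL representation, and the `recover` function are hypothetical stand-ins, not CortexaDB's actual types; the sketch only assumes that recovery replays WAL entries whose id is greater than the checkpointed id.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the engine: the "state machine" is just the
/// list of applied values, and ids increase by one per applied command.
#[derive(Default)]
struct Engine {
    state: Vec<u64>,
    last_applied_id: u64,
}

impl Engine {
    fn apply(&mut self, value: u64) {
        self.state.push(value);
        self.last_applied_id += 1;
    }
}

/// Capture state and id under ONE lock acquisition, so they always agree.
fn capture_consistent(writer: &Arc<Mutex<Engine>>) -> Option<(Vec<u64>, u64)> {
    let guard = writer.lock().ok()?;
    // The guard drops when this function returns; file I/O would come after.
    Some((guard.state.clone(), guard.last_applied_id))
}

/// Assumed recovery rule: start from the checkpointed state and replay
/// only the WAL entries whose id is greater than the checkpointed id.
fn recover(checkpoint: (Vec<u64>, u64), wal: &[(u64, u64)]) -> Vec<u64> {
    let (mut state, last_id) = checkpoint;
    for &(id, value) in wal {
        if id > last_id {
            state.push(value);
        }
    }
    state
}

/// Returns (state recovered from a consistent pair, state recovered from a
/// mismatched pair).
fn demo() -> (Vec<u64>, Vec<u64>) {
    let writer = Arc::new(Mutex::new(Engine::default()));
    let mut wal = Vec::new();
    for v in [10, 20, 30] {
        let mut g = writer.lock().unwrap();
        g.apply(v);
        wal.push((g.last_applied_id, v));
    }

    // Consistent pair: state and id are read together.
    let good = capture_consistent(&writer).unwrap();

    // Simulated TOCTOU: the state is read, a write lands, then the id is read.
    let stale_state = writer.lock().unwrap().state.clone();
    {
        let mut g = writer.lock().unwrap();
        g.apply(40);
        wal.push((g.last_applied_id, 40));
    }
    let newer_id = writer.lock().unwrap().last_applied_id;
    let bad = (stale_state, newer_id);

    (recover(good, &wal), recover(bad, &wal))
}
```

In this toy model the consistent pair recovers all four values, while the mismatched pair silently drops the write that landed between the two reads: its state predates the write, but its id tells recovery to skip the matching WAL entry.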

Comments suppressed due to low confidence (1)

crates/cortexadb-core/src/store.rs:577

  • WriteAheadLog::truncate_prefix() rewrites+renames the WAL file, but it is currently executed without holding the writer lock. While this thread is truncating, other write operations can still append to the old WAL file handle; those appends will not be present in the rewritten WAL and can be lost when reopen_wal() swaps the handle. To avoid WAL corruption/data loss, perform WAL flush + truncation + reopen_wal() while holding the same writer lock (and flush buffers before truncation so buffered entries are included in the rewrite).
                    // `wal_path` was captured at thread-spawn time — no lock needed.
                    if let Err(err) =
                        WriteAheadLog::truncate_prefix(&wal_path, CommandId(last_applied_id))
                    {
                        log::error!("cortexadb WAL truncation error: {err}");
                    } else {
                        let mut write_guard = match writer.lock() {
                            Ok(g) => g,
                            Err(e) => {
                                log::error!("cortexadb checkpoint error while reopening WAL (lock poisoned): {e}");
                                continue;
                            }
                        };
                        if let Err(err) = write_guard.engine.reopen_wal() {
                            log::error!("cortexadb WAL reopen error: {err}");
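
The suggestion in the suppressed comment (truncate and reopen while holding the writer lock) can be sketched roughly as follows. This is an in-memory model under stated assumptions: the `Writer` struct, `truncate_and_reopen`, and `truncate_under_lock` are hypothetical names, the WAL is a `Vec` rather than a file, and the rewrite-then-swap step only stands in for the rewrite, rename, and `reopen_wal()` sequence described above.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical writer: owns the WAL "file" and allocates command ids.
struct Writer {
    wal: Vec<(u64, String)>,
    next_id: u64,
}

impl Writer {
    fn append(&mut self, payload: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.wal.push((id, payload.to_string()));
        id
    }

    /// Rewrite the log without entries up to `last_applied_id`, then swap it
    /// in. Taking `&mut self` means the caller must already hold the lock.
    fn truncate_and_reopen(&mut self, last_applied_id: u64) {
        let rewritten: Vec<(u64, String)> = self
            .wal
            .iter()
            .filter(|(id, _)| *id > last_applied_id)
            .cloned()
            .collect();
        self.wal = rewritten; // stands in for rename + reopen
    }
}

/// Truncation and handle swap happen under one guard, so no append can land
/// between the rewrite and the swap.
fn truncate_under_lock(writer: &Mutex<Writer>, last_applied_id: u64) -> Result<(), String> {
    let mut guard = writer
        .lock()
        .map_err(|e| format!("lock poisoned: {e}"))?;
    guard.truncate_and_reopen(last_applied_id);
    Ok(())
}

/// Appends concurrently with a truncation and returns the surviving ids.
fn demo_truncate() -> Vec<u64> {
    let writer = Arc::new(Mutex::new(Writer { wal: Vec::new(), next_id: 1 }));
    for i in 0..5 {
        writer.lock().unwrap().append(&format!("cmd-{i}"));
    }

    let appender = {
        let writer = Arc::clone(&writer);
        thread::spawn(move || {
            for i in 5..55 {
                writer.lock().unwrap().append(&format!("cmd-{i}"));
            }
        })
    };

    truncate_under_lock(&writer, 3).unwrap();
    appender.join().unwrap();

    let ids: Vec<u64> = writer.lock().unwrap().wal.iter().map(|(id, _)| *id).collect();
    ids
}
```

Whatever the thread interleaving, every entry with an id above 3 survives, because an append either completes before the truncation takes the lock or starts after the swap. The trade-off is that writers are blocked for the duration of the rewrite, which a real file-backed WAL would need to weigh against the data-loss risk.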

Comment on lines +544 to +559
+let (checkpoint_state, last_applied_id) = match writer.lock() {
+    Ok(guard) => {
+        let id = guard.engine.last_applied_id().0;
+        let state = guard.engine.get_state_machine().clone();
+        (state, id)
+    }
+    Err(e) => {
+        log::error!("cortexadb checkpoint error (lock poisoned): {e}");
+        continue;
+    }
+};
+// Lock released — all I/O happens outside it.
+
-if let Err(err) = save_checkpoint(
-    &checkpoint_path,
-    read_snapshot.state_machine(),
-    last_applied_id,
-) {
+if let Err(err) =
+    save_checkpoint(&checkpoint_path, &checkpoint_state, last_applied_id)
+{
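
The `Err(e) => { ...; continue; }` arm in the hunk above handles a poisoned lock by skipping that checkpoint tick. A small sketch of when that arm fires, using only `std::sync::Mutex`; the `Engine` struct and `snapshot_or_skip` are hypothetical names, and `eprintln!` stands in for the `log::error!` call:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical engine stand-in; only the fields the snapshot needs.
struct Engine {
    state: Vec<u64>,
    last_applied_id: u64,
}

/// One checkpoint tick: returns None (skip this tick) if the lock is poisoned.
fn snapshot_or_skip(writer: &Mutex<Engine>) -> Option<(Vec<u64>, u64)> {
    match writer.lock() {
        Ok(guard) => Some((guard.state.clone(), guard.last_applied_id)),
        Err(e) => {
            eprintln!("checkpoint skipped (lock poisoned): {e}");
            None
        }
    }
}

/// Returns whether a snapshot succeeded (before poisoning, after poisoning).
fn demo_poison() -> (bool, bool) {
    let writer = Arc::new(Mutex::new(Engine {
        state: vec![1, 2],
        last_applied_id: 2,
    }));
    let before = snapshot_or_skip(&writer).is_some();

    // Poison the mutex: a thread panics while holding the guard.
    let w = Arc::clone(&writer);
    let _ = thread::spawn(move || {
        let _guard = w.lock().unwrap();
        panic!("simulated writer panic");
    })
    .join();

    let after = snapshot_or_skip(&writer).is_some();
    (before, after)
}
```

A standard-library mutex stays poisoned once a holder has panicked, so in this sketch every later tick is skipped as well. Whether that is the desired behavior for the checkpoint thread is a design decision the PR does not discuss.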
anaslimem merged commit bbb6dd8 into main on Mar 18, 2026
7 checks passed